Detection of Language (Model) Errors
Abstract
Bigram language models are popular in many language processing applications, for both Indo-European and Asian languages. However, when a language model for Chinese is applied to a novel domain, its accuracy drops significantly, from 96% to 78% in our evaluation. We apply pattern recognition techniques (i.e. Bayesian, decision tree and neural network classifiers) to detect language model errors. We examined two general types of features: model-based and language-specific features. In our evaluation, the Bayesian classifier produced the best recall (80%), but its precision was low (60%). The neural network classifier produced good recall (75%) and precision (80%), but both the Bayesian and neural network classifiers had a low skip ratio (65%). The decision tree classifier produced the best precision (81%) and skip ratio (76%), but its recall was the lowest (73%).
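To make the reported figures easier to compare, the following minimal sketch shows how precision and recall are conventionally computed for an error detector. It is an illustration only: the function name `precision_recall` and the toy labels are assumptions, not taken from the paper, and the skip ratio is left out because the abstract does not define it.

```python
def precision_recall(predicted, actual):
    """Compute precision and recall for a binary error detector.

    predicted: list of bools, True where the classifier flags an error.
    actual:    list of bools, True where the language model really erred.
    Both names are hypothetical; the abstract does not describe its data format.
    """
    if len(predicted) != len(actual):
        raise ValueError("predicted and actual must have the same length")

    true_pos = sum(1 for p, a in zip(predicted, actual) if p and a)
    false_pos = sum(1 for p, a in zip(predicted, actual) if p and not a)
    false_neg = sum(1 for p, a in zip(predicted, actual) if not p and a)

    # Precision: fraction of flagged items that are real errors.
    precision = true_pos / (true_pos + false_pos) if true_pos + false_pos else 0.0
    # Recall: fraction of real errors that were flagged.
    recall = true_pos / (true_pos + false_neg) if true_pos + false_neg else 0.0
    return precision, recall


# Toy data (invented for illustration): 5 real errors, 5 correct outputs.
actual = [True] * 5 + [False] * 5
predicted = [True, True, True, True, False, True, False, False, False, False]
precision, recall = precision_recall(predicted, actual)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Read this way, a classifier with high recall but low precision, as reported for the Bayesian classifier, finds most true errors but also raises many false alarms.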
Similar resources
Design and implementation of Persian spelling detection and correction system based on Semantic
The Persian language has special features (graphemes, homophones, and multi-shape clinging characters) in electronic devices. Furthermore, the design and implementation of NLP tools for Persian are more challenging than for other languages (e.g. English or German). Spelling tools are widely used for editing user texts such as emails and text in editors. Also developing Persian tools will provide Persian progr...
Presenting a ranker for a semantic spell checker using context-sensitive features
Nowadays, a large volume of documents is generated daily. These documents are written by different people and therefore contain spelling errors, which reduce their quality. Automatic writing assistance tools such as spell checkers and correctors can therefore help to improve their quality. Context-sensitive are misspelled words that have been...
Translation Quality Assessment of English Equivalents of Persian Proper Nouns: A case of bilingual tourist signposts in Isfahan
This study evaluated the translation quality of English equivalents of Persian proper nouns in the tourist signs and bilingual boards in Isfahan. To find different errors in the translations of the bilingual boards and tourist signs, the data were collected directly by taking pictures of, or copying exactly from, the available tourist signs and bilingual boards. Then, the errors were assesse...
Assessing the Quality of Persian Translation of Orwell’s Nineteen Eighty-Four Based on House’s Model: Overt-Covert Translation Distinction
This study aimed to assess the quality of Persian translation of Orwell's (1949) Nineteen Eighty-Four by Balooch (2004) based on House's (1997) model of translation quality assessment. To do so, 23 pages (about 10 percent) of the source text were randomly selected. The profile of the source text register was produced and the genre was realized. The source text profile was compared to t...